AITopics | action gap

Collaborating Authors

action gap

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

55769e1208c7f45e9acc98f06279c10c-Paper-Conference.pdf

Neural Information Processing SystemsFeb-13-2026, 15:17:02 GMT

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Banking & Finance (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

a44ba9086b2b83ccf2baf7c678723449-Paper.pdf

Neural Information Processing SystemsFeb-13-2026, 08:58:47 GMT

bellman operator, operator, sequence, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
North America > United States (0.14)
North America > Canada (0.04)

Genre: Research Report > New Finding (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning Harley Wiltzer Mila-Québec AI Institute McGill University Marc G. Bellemare

Neural Information Processing SystemsOct-10-2025, 02:56:00 GMT

In addition, we build a superiority-based DRL algorithm.

action gap, algorithm, theorem 3, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.40)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Banking & Finance (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A Family of Robust Stochastic Operators for Reinforcement Learning

Neural Information Processing SystemsOct-3-2025, 09:57:58 GMT

In this paper we complement these previous studies of operator alternatives and focus on operators that seek to achieve greater robustness w.r.t.

bellman operator, operator, sequence, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
North America > United States (0.14)
North America > Canada (0.04)

Genre: Research Report > New Finding (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

a44ba9086b2b83ccf2baf7c678723449-AuthorFeedback.pdf

Neural Information Processing SystemsOct-3-2025, 09:57:44 GMT

action gap, convergence, thm 3, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.30)

Technology: Information Technology > Artificial Intelligence (0.30)

Add feedback

Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

Harm Van Seijen, Mehdi Fatemi, Arash Tavakoli

Neural Information Processing SystemsAug-20-2025, 08:18:56 GMT

Neural Information Processing Systems http://nips.cc/

discount factor, q-learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Industry:

Leisure & Entertainment (0.68)
Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Add feedback

Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning

Wiltzer, Harley, Bellemare, Marc G., Meger, David, Shafto, Patrick, Jhaveri, Yash

arXiv.org Machine LearningOct-14-2024

When decisions are made at high frequency, traditional reinforcement learning (RL) methods struggle to accurately estimate action values. In turn, their performance is inconsistent and often poor. Whether the performance of distributional RL (DRL) agents suffers similarly, however, is unknown. In this work, we establish that DRL agents are sensitive to the decision frequency. We prove that action-conditioned return distributions collapse to their underlying policy's return distribution as the decision frequency increases. We quantify the rate of collapse of these return distributions and exhibit that their statistics collapse at different rates. Moreover, we define distributional perspectives on action gaps and advantages. In particular, we introduce the superiority as a probabilistic generalization of the advantage -- the core object of approaches to mitigating performance issues in high-frequency value-based RL. In addition, we build a superiority-based DRL algorithm. Through simulations in an option-trading domain, we validate that proper modeling of the superiority distribution produces improved controllers at high decision frequencies.

action gap, assumption 2, theorem 3, (15 more...)

arXiv.org Machine Learning

2410.11022

Country:

North America > United States (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning

van Seijen, Harm, Fatemi, Mehdi, Tavakoli, Arash

arXiv.org Machine LearningJun-3-2019

In an effort to better understand the different ways in which the discount factor affects the optimization process in reinforcement learning, we designed a set of experiments to study each effect in isolation. Our analysis reveals that the common perception that poor performance of low discount factors is caused by (too) small action-gaps requires revision. We propose an alternative hypothesis, which identifies the size-difference of the action-gap across the state-space as the primary cause. We then introduce a new method that enables more homogeneous action-gaps by mapping value estimates to a logarithmic space. We prove convergence for this method under standard assumptions and demonstrate empirically that it indeed enables lower discount factors for approximate reinforcement-learning methods. This in turn allows tackling a class of reinforcement-learning problems that are challenging to solve with traditional methods.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1906.00572

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment (0.68)
Education (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A General Family of Robust Stochastic Operators for Reinforcement Learning

Lu, Yingdong, Squillante, Mark S., Wu, Chai Wah

arXiv.org Machine LearningMay-28-2019

We consider a new family of operators for reinforcement learning with the goal of alleviating the negative effects and becoming more robust to approximation or estimation errors. Various theoretical results are established, which include showing on a sample path basis that our family of operators preserve optimality and increase the action gap. Our empirical results illustrate the strong benefits of our family of operators, significantly outperforming the classical Bellman operator and recently proposed operators.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1805.08122

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Increasing the Action Gap: New Operators for Reinforcement Learning

Bellemare, Marc G. (Google DeepMind) | Ostrovski, Georg (Google DeepMind) | Guez, Arthur (Google DeepMind) | Thomas, Philip S. (Google DeepMind) | Munos, Remi (Google DeepMind)

AAAI ConferencesApr-19-2016

This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator. As corollaries we provide a proof of optimality for Baird's advantage learning algorithm and derive other gap-increasing operators with interesting properties. We conclude with an empirical study on 60 Atari 2600 games illustrating the strong potential of these new operators.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Country: Europe > United Kingdom > England (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback